ABSTRACT

TDP-43 is an important pathological protein that
aggregates in diseased neuronal cells and is linked
to various neurodegenerative disorders. In normal 
cells, TDP-43 is primarily an RNA-binding protein; 
however, how the dimeric TDP-43 binds RNA via 
its two RNA recognition motifs, RRM1 and RRM2, 
is not clear. Here we report the crystal structure of 
human TDP-43 RRM1 in complex with a single- 
stranded DNA showing that RRM1 binds the 
nucleic acid extensively not only by the conserved 
β-sheet residues but also by the loop residues.
Mutational and biochemical assays further reveal 
that both RRMs in TDP-43 dimers participate in 
binding of UG-rich RNA or TG-rich DNA with RRM1 
playing a dominant role and RRM2 playing a sup- 
porting role. Moreover, RRM1 of the amyotrophic 
lateral sclerosis-linked mutant D169G binds DNA 
as efficiently as the wild type; nevertheless, it is 
more resistant to thermal denaturation, suggesting 
that the resistance to degradation is likely linked to 
TDP-43 proteinopathies. Taking all the data together,
we suggest a model showing that the two RRMs in 
each protomer of TDP-43 homodimer work together 
in RNA binding and thus the dimeric TDP-43 recog- 
nizes long clusters of UG-rich RNA to achieve high 
affinity and specificity. 

INTRODUCTION 

TDP-43 (TAR DNA-binding protein) is a DNA- and 
RNA-binding protein highly conserved in eukaryotes 
with multiple essential cellular functions in DNA transcription
and RNA translation (1). Yet TDP-43 is abnormally
aggregated, forming inclusions in neuronal cells in
amyotrophic lateral sclerosis (ALS) and frontotemporal
lobar degeneration (FTLD) (2,3). TDP-43 inclusions 
have also been characterized in various neurodegenerative 
disorders, including Alzheimer's disease and other 
tauopathies and Lewy body disorders, suggesting a full 
spectrum of TDP-43 proteinopathies (4-6). Lines of 
evidence show that either the loss of TDP-43 normal 
cellular function or the gain of abnormal toxic function 
of TDP-43 inclusions may lead to neuronal cell death and 
disorders (7). To understand the normal cellular function 
of TDP-43, it is important to first elucidate how it selects 
and binds to its target RNA sequences for the regulation 
of various RNA metabolism events.

TDP-43 was first characterized as a DNA-binding 
protein bound to the HIV-1 TAR DNA sequence for transcriptional
repression (8). However, further studies have
revealed that TDP-43 is primarily an RNA-binding 
protein with multiple roles in the regulation of mRNA
splicing, translation and transportation, and it is also
present in the Drosha complex for microRNA processing
and in stress granules for protecting mRNAs in stress conditions
(7,9). TDP-43 prefers to bind to UG-rich sequences
of single-stranded RNA, as demonstrated by its
binding at the UG repeats near exon 9 for splicing silencing
of the human CFTR gene, which encodes the cystic fibrosis
transmembrane conductance regulator and is linked to
cystic fibrosis (10,11). The sequence preference of TDP-43
for binding at UG repeats in the 3' or 5' splice sites
of pre-mRNA transcripts, which promote exon skipping
or inclusion, has also been observed in those of
apolipoprotein AII (12), eukaryotic translation termination
factor 1 (13), retinoid X receptor gamma (13), a
breast cancer 1-mutated substrate (13) and polymerase
delta interacting protein/S6 kinase 1 Aly/REF-like target
(POLDIP3/SKAR) (14). Nevertheless, TDP-43 also binds
to non-UG sequences, as shown by its binding to the non-UG
repeats in the 3' UTR of its own mRNA for the
autoregulation of TDP-43 levels (15). Genome-wide
RNA mapping further confirmed that TDP-43 prefers to
bind UG-rich sequences in a large set of RNA transcripts
and has a broad role in the regulation of alternative
splicing and gene expression (16-21). In these studies,
besides UG repeats, TDP-43 also preferred to bind to
poly(A)n sequences (18), GC-rich sequences (18), a
variant with an adenine in the middle, (UG)nUA(UG)n
(17), and polypyrimidine-rich sequences (16).

TDP-43 is well equipped for RNA binding, as it
contains two tandem RNA recognition motifs, RRM1 and
RRM2, besides the N-terminal domain (NTD) and the
C-terminal glycine-rich region (see Figure 1A). The RRM,
also known as the RNA-binding domain (RBD) and
ribonucleoprotein domain (RNP), is one of the most
abundant protein domains in eukaryotes (22). An RRM
contains two highly conserved segments, denoted
RNP1 and RNP2, of eight and six amino acids, respectively.
Typically, an RRM has a fold of a β-sheet packed
against two α-helices, with the conserved aromatic/hydrophobic
residues in RNP1 and RNP2 located in the β-sheet,
where they stack with the bases and the sugar rings of the
single-stranded RNA (23). Nevertheless, each RRM can interact
with a minimum of two to a maximum of eight nucleotides,
and the interactions can be sequence specific or nonspecific.
Moreover, tandem repeats of RRMs are frequently
identified in RRM proteins and some of these
proteins form oligomers, such as TDP-43, which forms a
homodimer with four RRMs (24-26). The interaction
between RRMs and RNA is thus intricate, as each
RRM may interact with RNA differently and tandem
RRMs can be assembled in diverse ways (22).

Figure 1. Overall crystal structure of hRRM1-DNA complex. (A)
Domain structure of TDP-43. (B) The molecular surface of hRRM1
bound with a ssDNA. The difference Fourier (Fo-Fc) electron density
map was superimposed on the structural model of DNA. (C) The
overall crystal structure of hRRM1 in complex with DNA.



The crystal structure of mouse RRM2 (mRRM2) in
complex with a single-stranded DNA (ssDNA) has been
reported, revealing that ssDNA is indeed bound at the flat
surface of the β-sheet (25). To establish the molecular
basis for TDP-43 in binding RNA and DNA, here we
report the crystal structure of human TDP-43 RRM1 in
complex with a ssDNA. Together with mutagenesis and
biochemical assays, we show that both RRM1 and RRM2
participate in nucleic acid binding to achieve the high
affinity and preference of TDP-43 for UG-rich RNA or TG-rich
DNA. A structural model of the TDP-43 homodimer
bound to RNA further suggests how TDP-43 binds to
mRNA transcripts with long UG repeats.

MATERIALS AND METHODS 

Protein expression and purification 

The gene fragment encoding the human TDP-43 RRM1
domain (residues 101-191) was amplified by polymerase
chain reaction using the primers
5'-GGGGGATCCCAGAAAACATCCGATTTA-3' and
5'-GGGAAGCTTATTTCTGCTTCTCAAAGGCTC-3'. The polymerase
chain reaction products were digested with BamHI and
HindIII and then inserted into the pQE30 expression vector
(Qiagen) for the expression of the N-terminal His-tagged
RRM1. The single-point mutants of hRRM1, W113A,
T115A, F147L, F149L, D169G, D169A, R171A and
N179A, were generated with the QuikChange® Site-Directed
Mutagenesis Kit (Stratagene).
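
As a quick consistency check on the cloning strategy, the two primers can be scanned for the recognition sequences of the restriction enzymes named above. The following is a minimal sketch; the recognition sequences GGATCC (BamHI) and AAGCTT (HindIII) are standard values that are not stated in the text, and the helper name `find_sites` is hypothetical.

```python
# Primer sequences as printed in the text (5' to 3').
FORWARD = "GGGGGATCCCAGAAAACATCCGATTTA"
REVERSE = "GGGAAGCTTATTTCTGCTTCTCAAAGGCTC"

# Standard recognition sequences of the two cloning enzymes
# (assumed here; the text names the enzymes but not their sites).
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(primer):
    """Return {enzyme: 0-based index} for each site present in the primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


forward_sites = find_sites(FORWARD)
reverse_sites = find_sites(REVERSE)
print("forward primer:", forward_sites)
print("reverse primer:", reverse_sites)
```

Under these assumptions, the forward primer carries only the BamHI site and the reverse primer only the HindIII site, which is what directional insertion into pQE30 would require.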

All of the TDP-43 RRM1 proteins were overexpressed
in the Escherichia coli M15 strain. The cell culture (40 ml)
was incubated overnight and then added into LB medium (1 L)
with 50 mg of ampicillin and allowed to grow to an
OD600 of ~0.5 at 37°C. The cell culture
was cooled down to 20°C and protein expression was
induced by adding 0.8 mM IPTG and 50 mg of ampicillin
for 22 h at 20°C.

The cells were lysed with a microfluidizer and centrifuged
at 12 000 rpm for 30 min. Cell extracts were loaded onto a
Ni-NTA affinity column (Qiagen) and the recombinant
His-tagged protein was eluted by a step gradient using
50 mM phosphate buffer, 0.5 M imidazole, 100 mM
NaCl and 10 mM β-mercaptoethanol at pH 7.5. Peak fractions
were dialyzed against 50 mM phosphate buffer containing
50 mM imidazole and 10 mM β-mercaptoethanol
at pH 7.5 and then applied to an SP column (GE
Healthcare) and eluted with a step gradient of 50 mM
phosphate, 1 M NaCl, 50 mM imidazole and 10 mM
β-mercaptoethanol at pH 7.5. For further biochemical
assays, the purified protein sample was dialyzed against
a buffer solution of 50 mM phosphate (pH 7.5), 50 mM
imidazole and 10 mM β-mercaptoethanol.

Filter binding assay 

The 30-nt (TG)15 DNA (10 pmol) was 5'-end labeled with
[γ-32P]ATP by T4 PNK. The labeled DNA was then
incubated with hRRM1 for 30 min at room temperature
in the binding buffer of 50 mM phosphate, 50 mM imidazole
and 10 mM β-mercaptoethanol at pH 7.5. The
mixture was filtered through a BA 85 nitrocellulose
membrane (Schleicher and Schuell) overlaid on a nylon
membrane (Roche) in a 48-well slot blot apparatus (Bio-Rad).
After extensive washing, the protein-DNA complex-bound
nitrocellulose membrane and the DNA-bound
nylon membrane were air dried and exposed to an
RX-N film (Fuji). The radioactive signals of labeled
DNA were counted by a UVP Biospectrum 600 Image
System (UVP) and the affinity was calculated using the
logistic equation with three parameters. The dissociation
constant values were deduced from the protein concentrations
at which half of the DNA substrates were bound
with hRRM1.
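
The quantification described above can be sketched in a few lines. This is an illustration only: the exact parameterization of the three-parameter logistic equation and the fitting software are not given in the text, so the function `logistic3`, the helper names and the synthetic titration below are assumptions. The sketch computes the bound fraction from the two membranes and then locates the half-saturation concentration by interpolation on a logarithmic concentration axis, which is a simpler stand-in for a full curve fit.

```python
import math


def fraction_bound(nitrocellulose_counts, nylon_counts):
    """Fraction of labeled DNA retained with protein on the nitrocellulose membrane."""
    total = nitrocellulose_counts + nylon_counts
    return nitrocellulose_counts / total if total > 0 else 0.0


def logistic3(conc, top, kd, slope):
    """Assumed three-parameter logistic: rises from 0 to `top`, half-maximal at `kd`."""
    return top / (1.0 + (kd / conc) ** slope)


def half_saturation(concs, fractions, top=1.0):
    """Interpolate, on a log-concentration axis, the concentration giving top/2 bound."""
    target = top / 2.0
    points = sorted(zip(concs, fractions))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= target <= f_hi and f_hi > f_lo:
            t = (target - f_lo) / (f_hi - f_lo)
            return math.exp(math.log(c_lo) + t * (math.log(c_hi) - math.log(c_lo)))
    raise ValueError("half-saturation point is not bracketed by the data")


# Synthetic titration (nM) generated from an assumed Kd of 20 nM; not measured data.
concs = [1, 3, 10, 30, 100, 300]
fractions = [logistic3(c, top=1.0, kd=20.0, slope=1.0) for c in concs]
kd_estimate = half_saturation(concs, fractions)
print(f"estimated half-saturation concentration: {kd_estimate:.1f} nM")
```

Because the interpolation is piecewise linear, the estimate is only as good as the spacing of the titration points around the midpoint; a least-squares fit of the logistic equation would use all points at once.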

Circular dichroism 

The thermal denaturing melting points of TDP-43 were
measured three times by a circular dichroism (CD) spectrometer
(AVIV CD400). The CD signal was scanned
from 25 to 85°C at a wavelength of 208 nm and the
melting point was estimated by the AVIV program. The
protein concentration was 0.3 mg/ml in a solution containing
100 mM NaCl in 50 mM phosphate buffer (pH 7.5).
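
The algorithm used by the AVIV program to estimate the melting point is not described here. As a hedged illustration of one common approach, the sketch below takes the melting point as the temperature at which the CD signal crosses the midpoint between the folded and unfolded baselines. The two-state sigmoid, its parameters and the function names are assumptions used only to generate synthetic data.

```python
import math


def two_state_signal(temp, tm, width, folded, unfolded):
    """Sigmoidal two-state melting curve (assumed model for the synthetic data)."""
    frac_unfolded = 1.0 / (1.0 + math.exp((tm - temp) / width))
    return folded + (unfolded - folded) * frac_unfolded


def estimate_tm(temps, signal):
    """Temperature where the signal crosses halfway between its first and last values."""
    midpoint = (signal[0] + signal[-1]) / 2.0
    for i in range(len(temps) - 1):
        s0, s1 = signal[i], signal[i + 1]
        if (s0 - midpoint) * (s1 - midpoint) <= 0 and s0 != s1:
            # Linear interpolation between the two bracketing temperatures.
            return temps[i] + (midpoint - s0) / (s1 - s0) * (temps[i + 1] - temps[i])
    raise ValueError("signal never crosses its midpoint")


# Synthetic scan from 25 to 85 degrees C in 1-degree steps; not measured data.
temps = list(range(25, 86))
signal = [two_state_signal(t, tm=53.0, width=2.0, folded=-20.0, unfolded=-8.0) for t in temps]
tm_estimate = estimate_tm(temps, signal)
print(f"estimated melting point: {tm_estimate:.1f} C")
```

This midpoint estimate assumes flat baselines at both ends of the scan; sloping baselines would call for a fit of the full two-state model instead.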

Crystallization, structure determination and refinement 

The human RRM1 used for crystallization was purified
with a procedure slightly different from the one used for
biochemical analysis. All the Tris-HCl buffer was
replaced by phosphate-buffered saline at pH 7.9,
and 1 mM DTT was added in the last purification step
using a HiTrap heparin column. The eluted proteins
were dialyzed against 1% glycerol and 20 mM Tris-HCl
at pH 7.9, and concentrated to ~6 mg/ml with a Vivaspin
ultrafiltration unit (Sartorius). The purified hRRM1 was
mixed with a Se-labeled ssDNA with a sequence of
5'-GTTGAseGCGTT-3' in a one-to-one molar ratio.
This selenium-labeled DNA had a Se-H attached at the
2'-C of the sugar ring of adenine (SeNA Research Inc.).
The crystals of the hRRM1-DNA complex were grown by the
hanging-drop vapor-diffusion method at room temperature,
by mixing 1 µl of protein-DNA solution with 1 µl of
reservoir solution containing 0.12 M CH3COONH4, 16%
PEG 3350 and 0.05 M Bis-Tris at pH 5.5.

X-ray diffraction data were collected at SPXF beamline
BL13C1 at NSRRC (Taiwan) at -150°C. The data were
processed and scaled by HKL2000 (27) and all of the diffraction
statistics are listed in Table 1. The hRRM1-DNA
complex crystallized in the P6522 hexagonal space group,
with one molecule per asymmetric unit. The structure of
the complex was solved by single-wavelength anomalous
dispersion (SAD) and refined by PHENIX. The final
model of the hRRM1-DNA complex at 2.75 Å resolution contains
91 amino acids (residues 101-191), 9 nt (G1 to T9)
and 14 water molecules. The data collection and refinement
statistics are summarized in Table 1.
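
Two of the entries in Table 1 can be cross-checked with simple arithmetic. The sketch below recomputes the overall redundancy as the ratio of observed to unique reflections, and derives the unit-cell volume from the cell dimensions using the general hexagonal-cell formula. The volume is not reported in the text and is shown only as an illustration; the variable names are hypothetical.

```python
import math

# Values taken from Table 1.
observed_reflections = 77077
unique_reflections = 4328
a = b = 71.11   # cell edges in angstroms
c = 101.63      # cell edge in angstroms

# Overall redundancy (multiplicity) is observed / unique reflections.
redundancy = observed_reflections / unique_reflections

# Hexagonal cell: alpha = beta = 90 degrees, gamma = 120 degrees,
# so the volume reduces to a * b * c * sin(120 degrees).
cell_volume = a * b * c * math.sin(math.radians(120.0))

print(f"redundancy  = {redundancy:.1f}")
print(f"cell volume = {cell_volume:.0f} cubic angstroms")
```

The recomputed redundancy rounds to the overall value of 17.8 listed in Table 1.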



RESULTS 

Overall crystal structure of hRRM1-DNA complex

To reveal the molecular basis underlying the interactions 
between TDP-43 and nucleic acids, the N-terminal 



Table 1. Crystallographic statistics of hRRM1-DNA complex

                                      Values
Data collection and processing
  Wavelength (Å)                      0.97622
  Space group                         P6522
  Cell dimensions (Å)                 a = b = 71.11, c = 101.63
  Resolution (Å)                      2.75
  Observed reflections                77 077
  Unique reflections                  4328
  Redundancy^a                        17.8 (18.7)
  Completeness^a (%)                  100 (99.9)
  Rsym^a                              7.8 (55.4)
  I/σ(I)^a                            43.5 (8.6)
Refinement statistics
  Resolution range                    26.33-2.75
  Reflections (work/test)             4301/430
  R-factor/R-free (%)                 20.65/25.86
  Nonhydrogen atoms
    Protein                           816
    Solvent molecules                 14
Model quality
  Root mean square deviations in
    Bond length (Å)                   0.003
    Bond angle (°)                    0.75
  Average B-factor (Å^2)              54.99
  Ramachandran plot (%)
    Most favored                      98.67
    Additionally allowed              1.33
    Generously allowed                0
    Disallowed                        0

^a The last shell (2.85-2.75 Å) statistics are listed in parentheses.



His-tagged human RRM1 (hRRM1, residues 101-191)
was expressed in E. coli and purified by chromatographic
methods for structural and biochemical studies. Various
DNA and RNA with different sequences and lengths,
including TG repeats and UG repeats, were used to co-crystallize
with hRRM1, but only a single-stranded 10-nt
DNA (5'-GTTGAseGCGTT-3') with a 2'-methylseleno-adenosine
(Ase) could be co-crystallized with hRRM1.
The hRRM1-DNA complex was crystallized in the hexagonal
space group P6522 with one hRRM1-DNA per
asymmetric unit by the hanging-drop vapor diffusion
method. The structure of the hRRM1-DNA complex
was solved by SAD in PHENIX using X-ray diffraction data
collected at the Se absorption edge. After auto-build of
the protein peptide chain, the Fourier map (2Fo-Fc)
revealed a clear continuous electron density, which could
be fitted with 9 nt from G1 to T9 (see Figure 1B). The final
refined model contained one hRRM1 polypeptide chain
(residues 103-179) and one ssDNA (G1 to T9) with an
R-factor/R-free of 20.65/25.86% for 4301/430 reflections
up to a resolution of 2.75 Å.

The overall crystal structure of the hRRM1-DNA
complex showed that hRRM1 had an αβ sandwich structure
containing a five-stranded β-sheet packed against two
α-helices (see Figure 1C). Similar to mRRM2, the five-stranded
β-sheet in hRRM1 had a topology of β2-β3-β1-β5-β4,
with the conserved RNP2 and RNP1 segments
located in β1 and β3, respectively. However, different
from mRRM2, hRRM1 had a longer loop between β2



Nucleic Acids Research, 2014, Vol. 42, No. 7 4715 



and β3, named Loop3. The ssDNA was bound on the flat
surface of the β-sheet of hRRM1, interacting mainly with
the RNP2 and RNP1 segments. The 5'-end of the DNA
extended upward and further interacted with the Loop1
residues.

Interactions between hRRM1 and DNA

The ssDNA bound primarily on the surface of the central
β-sheet, and a schematic diagram of the detailed interactions
between hRRM1 and DNA is shown in Figure 2A.
The 3'-end nucleotides, C7-G8, interacted most extensively
with the RNP segments in the β-sheet. Similar to
the classical RRM proteins, the conserved I107 in RNP2
formed nonbonded interactions with C7 and the
conserved F149 in RNP1 stacked with G8 (see
Figure 2A and B). The cytosine of C7 inserted into a
cleft on the hRRM1 surface, not only stacking with I107
but also forming hydrogen bonds with N179 (see
Figure 2C). The aromatic side chain of the conserved
F147 in RNP1 was inserted between the two sugar rings
of C7 and G8. Besides stacking with F149, the G8 base
also formed hydrogen bonds to hRRM1: N1 and N2
formed hydrogen bonds with residue D105 (Oδ2); N2
formed a hydrogen bond with Q134 (Oε1) (Figure 2D).
The G6 and A5 nucleotides did not stack with any amino
acid residues but they formed hydrogen bonds with
hRRM1: the O6 atom of G6 formed a hydrogen bond
with K176 (Nζ), and the N1 atom of A5se formed a
hydrogen bond with K176 (Nζ) (Figure 2G). The extensive
hydrogen-bonding networks surrounding G6, C7 and
G8 likely specified the sequence preference for G-C-G at
these three binding sites on the central β-sheet.

The T3 and G4 nucleotides interacted extensively with
Loop1 of hRRM1. G4 inserted into the surface cleft,
stacking with W113 (in Loop1) and R171, and formed
extensive hydrogen bonds with hRRM1: N1 and N2 to
L111 (backbone O); O6 to W113 (backbone N); N2 to
G146 (backbone O); N7 to R171 (Nη1) (see Figure 2E
and F). Moreover, T3 also stacked with W113 from
Loop1, but this π-π stacking interaction probably cannot
specify the preference for T3 at this position. In contrast,
the extensive hydrogen bond network surrounding G4
might specify the sequence preference for a G at this
binding site. The long loop, Loop3, interacted with the
DNA phosphate backbone of G8, suggesting that it
might be crucial for the recognition of the shape but not
the sequence of the ssDNA. In summary, hRRM1 bound
the ssDNA via nonbonded, π-π stacking interactions and
hydrogen bonding, and it might specify the preference for
a GC-rich sequence: X-G4-X-G6-C7-G8.

Mutation of the critical residues for nucleic acid binding 
and TDP-43 pathology in RRM1

To verify that the interactions observed between hRRM1
and DNA in the crystal structure were critical for nucleic
acid binding under physiological conditions in solution, we
further constructed several single-point mutants in
hRRM1, including W113A, T115A, F147L, F149L,
D169G, D169A, R171A and N179A (Figure 3A). Five of
the mutated residues participated in hRRM1-DNA
interactions in the crystal structure: W113, R171 and N179
in loop regions, and F147 and F149 in the RNP1 segment.
Moreover, we also constructed the D169G mutant because
mutation of D169 to glycine in TDP-43 was reported to
be associated with sporadic ALS (28). D169 formed a hydrogen
bond to T115, and therefore T115A, as well as D169A, was
also constructed for the comparison of its biochemical
properties with those of D169G (see Figure 3D).

The eight single-point His-tagged hRRM1 mutants were
expressed and purified and their DNA-binding affinities to
a 30-nt ssDNA with tandem TG repeats [(TG)15] were
measured by nitrocellulose filter binding assays. The
estimated dissociation constant (Kd) for the wild-type
hRRM1 against (TG)15 was 20.6 ± 1.8 nM (Figures 3B
and C). The two loop mutants, W113A and R171A, had
a ~6-fold higher Kd of 114.5 ± 8.7 and 124.2 ± 12.7 nM,
respectively. The two RNP1 mutants, F147L and F149L, had
a 3- to 4-fold higher Kd of 71.2 ± 1.4 and 86.2 ± 7.7 nM,
respectively. However, the N179A mutant only had a
slightly higher Kd of 25.9 ± 6.0 nM than that of the wild-type
hRRM1. These results confirm that the interface
residues observed in the crystal structure indeed contribute
to the interactions between hRRM1 and DNA in the low-salt
buffer at pH 7.5.
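
The fold changes quoted above follow directly from the ratio of each mutant Kd to the wild-type Kd. A minimal sketch of that arithmetic, using only the central values from the text (the reported errors are ignored, and the variable names are hypothetical):

```python
WILD_TYPE_KD = 20.6  # nM, wild-type hRRM1 against (TG)15

MUTANT_KD = {  # nM, central values quoted in the text
    "W113A": 114.5,
    "R171A": 124.2,
    "F147L": 71.2,
    "F149L": 86.2,
    "N179A": 25.9,
}

# Fold change in Kd relative to wild type; larger means weaker binding.
fold_change = {name: kd / WILD_TYPE_KD for name, kd in MUTANT_KD.items()}

for name, fold in fold_change.items():
    print(f"{name}: {fold:.1f}-fold higher Kd")
```

The loop mutants come out near 6-fold, the RNP1 mutants between 3- and 4-fold, and N179A close to 1-fold, in line with the description in the text.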

The disease-related mutant D169G unexpectedly had a
slightly lower Kd of 14.2 ± 1.7 nM than that of the wild-type
hRRM1 (20.6 ± 1.8 nM). Similarly, D169A also had
a slightly lower Kd of 13.5 ± 2.1 nM, suggesting that
mutation of D169 to G or A did not impair the DNA-binding
activity of hRRM1. Moreover, T115A also had a
comparable Kd of 23.7 ± 5.0 nM. Apart from the RRM1
D169G mutant, we also constructed a D169G hN12
mutant using the C-terminal tail-truncated hTDP-43
(hN12, residues 1-259) as the template. We found that
this D169G hN12 mutant bound RNA [(UG)15] with an
affinity (Kd = 5.5 ± 0.5 nM) similar to that of the wild-type
hN12 (Kd = 5.3 ± 1.2 nM) (Figure 5A). This result
further showed that the D169G mutant retained its ability in
RNA binding.

Further analysis by CD showed that hRRM1 D169G
had a CD spectrum similar to that of the
wild-type hRRM1, suggesting that D169G retained its
folded structure. It was thus intriguing why D169G bound
DNA slightly better than wild-type RRM1, because D169
was not involved in the interactions between hRRM1 and
DNA. A further assay showed that D169G in fact had a
higher thermal stability, with a melting point of 60.6°C as
compared with the 53.0°C melting point of wild-type
hRRM1, as monitored by CD at 208 nm (Figure 3E).
Thus a single-point mutation of D169 to G had a
profound effect: the melting point of hRRM1 was
increased by ~7.6°C. In summary, the D169G hRRM1 mutant
retained its overall structure with increased thermal stability
and bound single-stranded TG repeats with a slightly
higher affinity as compared with the wild-type hRRM1.

RRM1 interacts more extensively with nucleic acids
than RRM2 

The two RRMs in hTDP-43 share a high sequence identity
of 22%; however, hRRM1 is longer than hRRM2 due to
longer loops (see Figure 4A). The crystal structure of
mRRM2 in complex with a ssDNA that had a sequence
identical to the one in the hRRM1-DNA complex was
reported (PDB: 3D2W). The comparison of the
protein-nucleic acid interactions between hRRM1 and
mRRM2 showed that most of the residues that interacted
with nucleic acids were located within or next to the
β-sheet (blue arrows) and the rest of the interacting residues
were located in the Loop1 and Loop3 regions (red arrows
in Figure 4A). hRRM1 interacted more extensively
with ssDNA than mRRM2 (12 versus 9 arrows) because
more residues in hRRM1 participated in the interactions.
These results are consistent with the previous binding
assays showing that RRM1 plays a more critical role in
nucleic acid binding than RRM2 (10).

Figure 2. The interactions between hRRM1 and DNA. (A) Schematic diagram of the interactions between hRRM1 and DNA. I107, F147 and F149
are the conserved aromatic/hydrophobic residues in RNP2 and RNP1 segments that interact with DNA bases and sugar rings of C7-G8. (B) The
cytosine of C7 stacks with the side chains of I107 and N179. (C) C7 forms hydrogen bonds to N179. (D) G8 forms hydrogen bonds with D105 and
Q134. (E) G4 stacks between R171 and W113, whereas T3 stacks with W113. (F) G4 hydrogen bonds with L111, W113 and G146. (G) G6 hydrogen
bonds with K176.

Figure 3. Mutation of the critical residues for nucleic acid binding and TDP-43 pathogenesis in hRRM1. (A) Mutation of a number of residues
(indicated by arrows) in hRRM1. (B) The binding affinity between the wild-type hRRM1 and (TG)15 was measured by the nitrocellulose filter
binding assay. The 5'-end 32P-labeled DNA (10 pmol) was incubated with hRRM1 (0.0002-300 µM) and the hRRM1-DNA complexes trapped in the
nitrocellulose filters were quantified. (C) The apparent Kd between hRRM1 mutants and (TG)15. (D) D169 located in Loop6 between β4 and β5
forms a hydrogen bond to the side chain of T115 located in Loop1. (E) The CD spectra of hRRM1 and hRRM1-D169G mutant. (F) The thermal
melting points of hRRM1 and hRRM1-D169G mutant were estimated by CD at a wavelength of 208 nm.

A close examination of the two complex structures
revealed that the β-sheet residues in mRRM2
mainly interacted with T3-G4, whereas the
β-sheet residues in hRRM1 interacted with C7-G8.
Moreover, hRRM1 had extra interactions with T3-G4
via its loop residues L111 and W113. The stacking
between T3/W113/G4/R171 was only observed in the
hRRM1-DNA complex because Loop1 in mRRM2 was
located more distantly from the DNA and the aligned
E200 and I253 in the same positions either pointed away
or were located too far away from the DNA (Figure 4B). The
molecular surface of Loop1 of mRRM2 clearly showed an
acidic surface that is not suitable for binding RNA/DNA
(bottom panel in Figure 4B). On the other hand, the mutations
of W113 and R171 to alanine generated hRRM1 mutants
defective in DNA binding, suggesting that the
interactions around the RRM1 Loop1 region were critical
for the DNA-binding activity of TDP-43. In summary,
our results confirm that RRM1 of TDP-43 interacts
more extensively with nucleic acids and that RRM1
prefers binding to the G4-X-G6-C7-G8 sequence, whereas
RRM2 prefers binding to the T3-G4 sequence.

TDP-43 binds TG-rich DNA and UG-rich RNA with a 
high affinity 

To determine how TDP-43 binds DNA and RNA via its
RRMs, a number of human TDP-43 mutants were
constructed using the C-terminal tail-truncated hTDP-43
(hN12, residues 1-259) as the template, which had the NTD,
RRM1 and RRM2, but not the C-terminal tail. Single- and
double-point mutants in RRM1 were constructed with mutations
in the RNP1 segment, including hN12-F147A,
hN12-F149A and hN12-F147A-F149A. Similarly, single-
and double-point mutants in RRM2 in the aligned RNP1
segment were constructed, including hN12-F229A, hN12-F231A
and hN12-F229A-F231A. The DNA-binding and
RNA-binding affinities of these mutants to (TG)15 and
(UG)15 were measured by nitrocellulose filter binding assays.

Figure 4. hRRM1 interacts more extensively with DNA, as revealed by the comparison between the crystal structures of hRRM1-DNA and
mRRM2-DNA. (A) Sequence alignment of human and mouse RRM domains. Blue and red arrows mark the residues that interact with ssDNA
via nonbonded interactions or hydrogen bonding. Red arrows are located in the loops and blue arrows are located within or close to β-strands. (B)
The crystal structure of hRRM1-DNA (green and purple) is superimposed on the mRRM2-DNA (gray and black) complex. A close look in the right top
panel shows that only Loop1 of hRRM1 interacts with DNA: W113/R171 stacking with T3/G4. On the other hand, the Loop1 of RRM2 does not
interact with DNA because the molecular surface of Loop1 is acidic and not suitable for DNA binding, as shown in the right bottom panel.

The wild-type hN12 bound tightly to both single-stranded
RNA and DNA, with a Kd of 5.3 ± 1.2 nM for
(UG)15 and 5.9 ± 0.8 nM for (TG)15 (Figure 5A).
Mutations in RRM1 largely reduced the binding affinity,
by ~50-fold for UG-repeated RNA and ~20-fold for TG-repeated
DNA, suggesting that the mutated phenylalanine
residues play a major role in both RNA and DNA binding
in TDP-43. Mutations in RRM2 also reduced the binding
affinity, but only ~2-fold for UG-repeated RNA and 2- to
8-fold for TG-repeated DNA. These results suggest that
RRM1 of hTDP-43 plays a more dominant role in nucleic
acid binding, whereas RRM2 also participates in the interactions.
Moreover, this result suggests that TDP-43 likely
binds to single-stranded DNA and RNA in a similar
binding mode and therefore mutations in TDP-43
produce similar effects of reduced binding for DNA
and RNA.

Apparent Kd (nM) between truncated TDP-43 (hN12) mutants
and (UG)15 single-stranded RNA or (TG)15 single-stranded DNA

                            RNA (UG)15        DNA (TG)15
       hN12                 5.3 ± 1.2         5.9 ± 0.8
       hN12-D169G           5.5 ± 0.5         N.D.*
RRM1   hN12-F147A           301.6 ± 41.4      113.1 ± 7.4
       hN12-F149A           343.9 ± 48.8      94.6 ± 3.7
       hN12-F147AF149A      296.0 ± 27.2      172.3 ± 1.6
RRM2   hN12-F229A           8.9 ± 1.5         11.5 ± 2.3
       hN12-F231A           10.9 ± 2.2        16.9 ± 0.4
       hN12-F229AF231A      11.9 ± 1.9        47.1 ± 4.7

*N.D., not determined.

Figure 5. Mutations in either RRM1 or RRM2 reduce the TDP-43
high-affinity binding for UG-repeated RNA or TG-repeated DNA.
(A) The C-terminal truncated TDP-43 (hN12, residues 1-259) was
incubated with 5'-end 32P-labeled single-stranded (UG)15 RNA or
(TG)15 DNA (10 pmol) for the measurement of binding affinity by
the nitrocellulose filter binding assay. The estimated apparent Kd values
of the wild-type and mutated hN12 show that mutations in
both RRM1 and RRM2 generated mutants defective in RNA and
DNA binding. (B) The nitrocellulose filter binding assays reveal the
sequence specificity of TDP-43 for UG repeats and UG-rich sequences.
hN12 (2 µM) bound to (UG)6 with the highest affinity (estimated
bound fraction: 67.9 ± 1.9%), the UG-rich PS1 and PS2 with moderate
affinity (23.1 ± 0.8 and 6.2 ± 0.8%) and (CA)6 with the lowest affinity
(0.6 ± 1.0%).
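
The approximate fold reductions stated for the hN12 mutants can be checked against the Kd values in Figure 5A. The sketch below groups the mutants by domain and reports the smallest and largest fold change for each ligand. It uses the central Kd values only, and the names `WILD_TYPE`, `MUTANTS` and `fold_range` are hypothetical.

```python
WILD_TYPE = {"RNA": 5.3, "DNA": 5.9}  # nM, wild-type hN12 (Figure 5A)

MUTANTS = {  # domain -> mutant -> apparent Kd in nM (Figure 5A)
    "RRM1": {
        "F147A": {"RNA": 301.6, "DNA": 113.1},
        "F149A": {"RNA": 343.9, "DNA": 94.6},
        "F147A-F149A": {"RNA": 296.0, "DNA": 172.3},
    },
    "RRM2": {
        "F229A": {"RNA": 8.9, "DNA": 11.5},
        "F231A": {"RNA": 10.9, "DNA": 16.9},
        "F229A-F231A": {"RNA": 11.9, "DNA": 47.1},
    },
}


def fold_range(domain, ligand):
    """Smallest and largest Kd fold change for one domain's mutants and one ligand."""
    folds = [kd[ligand] / WILD_TYPE[ligand] for kd in MUTANTS[domain].values()]
    return min(folds), max(folds)


for domain in MUTANTS:
    for ligand in WILD_TYPE:
        low, high = fold_range(domain, ligand)
        print(f"{domain} mutants, {ligand}: {low:.1f}- to {high:.1f}-fold higher Kd")
```

Grouping by domain makes the asymmetry between the two motifs explicit: every RRM1 mutant weakens binding by more than an order of magnitude, whereas the RRM2 mutants stay within a single-digit fold change.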



To further confirm the specific interactions for UG-rich
sequences, different 12-nt RNA sequences were
synthesized for binding assays, including a UG repeat
(UG)6, two UG-rich putative sites PS1
(AGAGUGAAAAUG) and PS2 (CGCGUGCCCCUG), and
a putative low-affinity sequence (CA)6. TDP-43 N12
bound to (UG)6 with the highest affinity (estimated
bound fraction: 67.9 ± 1.9%) and bound to PS1
(23.1 ± 0.8%) and PS2 (6.2 ± 0.8%) with moderate
affinity, whereas it did not bind to (CA)6
(0.6 ± 1.0%) (see Figure 5B). This result suggests that
TDP-43 indeed has specificity for UG repeats and UG-rich
sequences.



DISCUSSION 

How TDP-43 binds RNA with tandem UG repeats 

The RNA-recognition motif (RRM) is the most abundant
RBD in higher vertebrates, with an estimation that
RRMs are present in 0.5-1% of human gene products
(29). RRM-containing proteins participate in various
posttranscriptional RNA processing events, such as mRNA
splicing, editing, export, stability regulation and
turnover (30). Most of the eukaryotic RRM-containing
proteins bear multiple copies of RRM so that they can
bind a long stretch of RNA with high affinity and/or
sequence specificity, with examples including hnRNP A1
(2 RRMs), Sex-lethal (2 RRMs), FIR (2 RRMs), Prp24
(3 RRMs), PABP (4 RRMs) and PTB (4 RRMs) (31-36).
Moreover, RRM proteins often form homodimers, and
these dimers contain double the copies of RRMs for
RNA binding. All of the RRM structures reported to
date show that RRM binds single-stranded RNA or
DNA on the β-sheet surface, interacting primarily with
the conserved aromatic/hydrophobic residues in the RNP1
and RNP2 segments.

Similar to other RRM proteins, TDP-43 contains 
multiple copies of RRMs and forms a homodimer, and 
as a result, the dimeric TDP-43 has four copies of RRMs. 
It has been suggested that TDP-43 forms a dimer through 
the interactions between its NTDs (24,26,37). Our binding 
assays show that the dimeric C-terminal truncated TDP-43
(N12, residues 1-259) binds single-stranded (UG)15
repeats with a high affinity and an estimated dissociation
constant of 5.3 nM. Our mutational results further reveal
that the binding affinity between TDP-43 (N12) and
ssRNA is greatly reduced by mutations in RRM1 and
moderately reduced by mutations in RRM2, suggesting
that RRM1 plays a more dominant role and that RRM2
also participates in the interactions but plays only a supporting
role in RNA binding. This binding result is consistent
with the crystal structures showing that RRM1
interacts more extensively with the bound DNA. The
high binding affinity between TDP-43 and (UG)15 could
be due to the contribution from the multiple copies of
RRMs.

How does TDP-43 bind long clusters of UG-rich sequences
in pre-mRNA transcripts? Previously, we
determined the low-resolution structure of TDP-43 N12
by small-angle X-ray scattering (SAXS) (37). The
SAXS structure envelope clearly reveals an elongated conformation
and one half of the envelope can be fitted with
three domains: NTD, RRM1 and RRM2 (see Figure 6A).
Previous studies reveal that TDP-43 forms a homodimer
via its NTD, and therefore the NTDs are suggested to locate
in the middle with the two RRM2 domains flanking outward
at the two sides of the elongated envelope. It is difficult to
predict how RRM1 and RRM2 are assembled in TDP-43
because RRMs can be associated with each other in diverse
ways (22). However, we found that RRM1 and RRM2
can be oriented in the SAXS envelope in such a way that the
two bound ssDNAs can form a continuous 5'-3' strand as
it extends from the β-sheet surface of RRM1 to that of
RRM2 (Figure 6).

Figure 6. The structural model of TDP-43 bound to single-stranded
nucleic acids. (A) The SAXS envelope of TDP-43 dimer fitted with
the crystal structure of hRRM1-DNA and mRRM2-DNA in the orientation
that the DNA forms a continuous 5'-3' strand, as it is bound
from hRRM1 to mRRM2 in TDP-43. (B) The putative RNA binding
sequence of TDP-43 is derived from the sequence in hRRM1-DNA and
mRRM2-DNA complexes. (C) TDP-43 homodimer likely binds to a
long UG-rich RNA via its RRM1 and RRM2 domains.



Several in vitro studies showed that TDP-43 preferentially
binds the TG/UG dinucleotide repeat element
(10,17,18,38). In the crystal structure of the hRRM1-DNA
complex, Loop1 residues recognize the T3-G4 dinucleotide,
whereas the β-sheet interacts with G6-C7-G8. The
C7 can be replaced by T7, which can also fit snugly and
form a hydrogen bond with N179. Therefore, it appears
that RRM1 can specifically recognize RNA with a
sequence of X-G-X-G-U/C-G (see Figure 6B). On the
other hand, mRRM2 only recognizes a T3-G4 via the
β-sheet residues. Based on the model of the TDP-43-RNA
complex, we thus generate the RNA sequence that is
likely recognized by TDP-43: X-G-X-G-U/C-G-X-X-X-
U-G (see Figure 6B). This putative sequence is not
limited to UG repeats but can accommodate some other
sequences in between the UG sequences, consistent with
the TDP-43 binding sites, which are rich in UG repeats
but not limited to them.
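
As an illustration only, the putative sequence can be read literally as a pattern and searched for in a nucleotide string. The sketch below assumes that X stands for any of the four nucleotides and treats T and U as equivalent; the function name `find_putative_sites` is hypothetical, and such a literal match says nothing about binding affinity.

```python
import re

# Putative motif from the text: X-G-X-G-U/C-G-X-X-X-U-G.
# Assumption: X is any one of A, C, G or U.
# The lookahead wrapper lets overlapping matches be reported.
MOTIF = re.compile(r"(?=([ACGU]G[ACGU]G[UC]G[ACGU]{3}UG))")


def find_putative_sites(sequence):
    """Return (0-based start, 11-nt window) for every literal match of the motif."""
    rna = sequence.upper().replace("T", "U")  # treat DNA input as its RNA equivalent
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(rna)]


if __name__ == "__main__":
    # Made-up test sequence, not taken from any transcript.
    print(find_putative_sites("AAUGUGUGAAAUGCC"))
```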

Interestingly, the solution structure of TDP-43 RRM1-
RRM2 (residues 102-269) in complex with a single-stranded
RNA has been reported recently, showing an
arrangement of RRM1 and RRM2 bound to the
extended RNA from the 5' to the 3' end similar to that in
our model (39). The interactions between TDP-43
RRM1-RRM2 and RNA also show a sequence specificity
similar to that observed in the crystal structures, with RRM1
specifically recognizing G1-N-G3-U4-G5 and RRM2
recognizing U8-G9 in the 12-nt ssRNA with a sequence
of 5'-G1-U2-G3-U4-G5-A6-A7-U8-G9-A10-A11-U12-3'.
The similar specific interactions observed in the NMR
solution and X-ray crystal structures suggest that TDP-43
has a preference and a similar binding mode for
UG-rich and TG-rich sequences.

It remains unclear whether all four RRMs interact with
one strand or separate strands of RNA, or whether only the
two RRM1 domains in TDP-43 dominate the interactions. Our
previous data and other studies reported that at least 6
UG repeats were required for TDP-43 high-affinity
binding and that longer UG repeats were bound by TDP-43
with higher affinities (10,25). The cross-linking-
immunoprecipitation sequencing studies also revealed
long UG-rich sequences of >100 nt (16,19,20). Taken
together, these results suggest that TDP-43 not only binds
UG repeats via its RRM1 and RRM2, but that both protomers
might also interact with a long stretch of RNA to increase
the binding affinity and specificity (see the model in
Figure 6C). It is also possible that a long RNA with
multiple binding regions accommodates more than one
TDP-43 for interactions.

Disease-related mutation of D169G in TDP-43 

Numerous mutations have been identified in TDP-43 that
are linked to ALS and FTLD (6). These mutations are
mostly located in the C-terminal glycine-rich tail with
the exception of D169G, which is the only mutation
located in RRM1 (28). In the crystal structure of the
hRRM1-DNA complex, D169 is located in Loop6
between β4 and β5, and it hydrogen bonds to the side
chain of T115 located in Loop1 between β1 and α1
(Figure 3C). This interaction thus appears critical
and may stabilize the structure of Loop1 and Loop6.
However, unexpectedly, we found that the D169G hRRM1
mutant retained its overall structure and bound
single-stranded (TG)15 with a slightly higher affinity
than the wild-type hRRM1. These results suggest that
the disease-linked D169G mutation in TDP-43 did
not produce any defect in protein folding or RNA
binding.

The only significant biochemical phenotype of D169G
we observed here is that it was more resistant to thermal
denaturation, with a melting temperature ~7°C higher
than that of wild-type hRRM1 as monitored by CD. The
unique features of TDP-43 proteinopathies include its
mislocalization and aggregation in the cytoplasm. The
mislocalized cytoplasmic TDP-43 can be degraded through the
ubiquitin-proteasome and autophagosome-mediated
degradation systems (40). It has been shown that D169G has
reduced binding to and decreased co-aggregation with
UBQLN, suggesting that the D169G mutant is degraded by
the proteasome less efficiently (41). Alternatively, it has
been shown that D169 is located in one of the potential
caspase cleavage sites (DXXD) of TDP-43, and D169G is
a cleavage-resistant mutant. The D169G mutation thus
results in a higher level of the full-length TDP-43 with
stronger death-inducing activity in the cytoplasm (42).
Here we further provide a solid line of evidence showing
that the D169G hRRM1 mutant is more thermally stable than
the wild-type hRRM1. Hence, the D169G mutant is probably
not only more resistant to proteasome digestion and
caspase cleavage, but also more stable, with a longer
half-life than that of wild-type TDP-43. It has been
shown that ALS-linked mutations exhibit longer protein
half-lives, about 12 h for wild-type TDP-43 and 24-48 h
for ALS-linked mutants (43). Therefore, it is likely that
the increased thermal stability of D169G is correlated with
its resistance to degradation and a higher level of TDP-43
for eliciting TDP-43 proteinopathies.
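
The consequence of the reported half-lives for protein levels can be illustrated with a minimal sketch, assuming simple first-order (exponential) decay of a protein pool with no new synthesis. This decay model and the function name `fraction_remaining` are assumptions for illustration; only the half-life values (about 12 h and 24-48 h) come from the text.

```python
def fraction_remaining(hours, half_life_h):
    """Fraction of an initial protein pool left after `hours`, assuming first-order decay."""
    if half_life_h <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours / half_life_h)


if __name__ == "__main__":
    # Half-lives quoted in the text: ~12 h (wild type), 24-48 h (ALS-linked mutants).
    cases = [("wild type", 12.0), ("mutant, 24 h", 24.0), ("mutant, 48 h", 48.0)]
    for label, t_half in cases:
        left = fraction_remaining(24.0, t_half)
        print(f"{label}: {left:.2f} of the pool remains after 24 h")
```

Under these assumptions, about one quarter of the wild-type pool would remain after 24 h, compared with one half to roughly 70% of a mutant pool.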

CONCLUSION 

This study reveals, at the atomic level, how the RRM1 domain
of TDP-43 interacts extensively with the ssDNA via the
β-sheet and loop residues. Taking the structural,
biochemical and mutational results together, we conclude
that both RRMs in TDP-43 participate in binding of a
long stretch of nucleic acids, with RRM1 playing a
dominant role and RRM2 playing a supporting role.
TDP-43 thus binds long clusters of UG-rich RNA via its
four RRM domains in the two protomers to achieve high
affinity and specificity.